1. Data

Load the student scores for the test - here we load the ETH Zurich test data, downloaded from https://pontifex.ethz.ch/s21t5/rate/

Show data summary

skim(test_scores)
Data summary
Name test_scores
Number of rows 9671
Number of columns 44
_______________________
Column type frequency:
character 3
numeric 41
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
test_version 0 1 3 4 0 2 0
year 0 1 4 4 0 5 0
class 0 1 13 13 0 5 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
A1_B1 0 1.00 0.97 0.21 0 1 1 1 2 <U+2581><U+2581><U+2587><U+2581><U+2581>
A2_B0 6238 0.35 0.43 0.62 0 0 0 1 2 <U+2587><U+2581><U+2583><U+2581><U+2581>
A3_B2 0 1.00 0.70 0.51 0 0 1 1 2 <U+2583><U+2581><U+2587><U+2581><U+2581>
A4_B3 0 1.00 0.66 0.51 0 0 1 1 2 <U+2585><U+2581><U+2587><U+2581><U+2581>
A5_B4 0 1.00 1.03 0.72 0 1 1 2 2 <U+2583><U+2581><U+2587><U+2581><U+2585>
A6_B5 0 1.00 1.06 0.51 0 1 1 1 2 <U+2581><U+2581><U+2587><U+2581><U+2582>
A7_B6 0 1.00 0.72 0.47 0 0 1 1 2 <U+2583><U+2581><U+2587><U+2581><U+2581>
A8_B7 0 1.00 0.80 0.65 0 0 1 1 2 <U+2585><U+2581><U+2587><U+2581><U+2582>
A9_B8 0 1.00 1.08 0.74 0 1 1 2 2 <U+2585><U+2581><U+2587><U+2581><U+2586>
A10_B9 0 1.00 1.01 0.68 0 1 1 1 2 <U+2583><U+2581><U+2587><U+2581><U+2583>
A11_B10 0 1.00 0.93 0.64 0 1 1 1 2 <U+2583><U+2581><U+2587><U+2581><U+2582>
A12_B0 6238 0.35 0.90 0.54 0 1 1 1 2 <U+2582><U+2581><U+2587><U+2581><U+2581>
A13_B0 6238 0.35 0.60 0.69 0 0 0 1 2 <U+2587><U+2581><U+2586><U+2581><U+2582>
A14_B12 0 1.00 0.81 0.64 0 0 1 1 2 <U+2585><U+2581><U+2587><U+2581><U+2582>
A15_B13 0 1.00 0.74 0.69 0 0 1 1 2 <U+2587><U+2581><U+2587><U+2581><U+2582>
A16_B14 0 1.00 0.61 0.80 0 0 0 1 2 <U+2587><U+2581><U+2582><U+2581><U+2583>
A17_B15 0 1.00 0.73 0.58 0 0 1 1 2 <U+2585><U+2581><U+2587><U+2581><U+2581>
A18_B16 0 1.00 0.78 0.80 0 0 1 1 2 <U+2587><U+2581><U+2586><U+2581><U+2585>
A19_B0 6238 0.35 0.95 0.81 0 0 1 2 2 <U+2587><U+2581><U+2587><U+2581><U+2587>
A20_B17 0 1.00 0.83 0.50 0 1 1 1 2 <U+2582><U+2581><U+2587><U+2581><U+2581>
A21_B18 0 1.00 0.92 0.69 0 0 1 1 2 <U+2585><U+2581><U+2587><U+2581><U+2583>
A22_B19 0 1.00 1.00 0.70 0 1 1 1 2 <U+2583><U+2581><U+2587><U+2581><U+2583>
A23_B20 0 1.00 0.90 0.75 0 0 1 1 2 <U+2586><U+2581><U+2587><U+2581><U+2585>
A24_B21 0 1.00 0.89 0.64 0 0 1 1 2 <U+2583><U+2581><U+2587><U+2581><U+2582>
A25_B0 6238 0.35 0.72 0.53 0 0 1 1 2 <U+2585><U+2581><U+2587><U+2581><U+2581>
A26_B22 0 1.00 1.07 0.44 0 1 1 1 2 <U+2581><U+2581><U+2587><U+2581><U+2582>
A27_B23 0 1.00 1.02 0.77 0 0 1 2 2 <U+2586><U+2581><U+2587><U+2581><U+2586>
A28_B24 0 1.00 1.08 0.75 0 1 1 2 2 <U+2585><U+2581><U+2587><U+2581><U+2586>
A29_B25 0 1.00 0.88 0.70 0 0 1 1 2 <U+2585><U+2581><U+2587><U+2581><U+2583>
A30_B0 6238 0.35 0.90 0.50 0 1 1 1 2 <U+2582><U+2581><U+2587><U+2581><U+2581>
A31_B27 0 1.00 0.63 0.69 0 0 1 1 2 <U+2587><U+2581><U+2586><U+2581><U+2582>
A32_B28 0 1.00 0.70 0.84 0 0 0 1 2 <U+2587><U+2581><U+2583><U+2581><U+2583>
A33_B29 0 1.00 0.83 0.46 0 1 1 1 2 <U+2582><U+2581><U+2587><U+2581><U+2581>
A34_B0 6238 0.35 1.14 0.62 0 1 1 2 2 <U+2582><U+2581><U+2587><U+2581><U+2583>
A35_B0 6238 0.35 1.31 0.76 0 1 1 2 2 <U+2583><U+2581><U+2585><U+2581><U+2587>
A36_B0 6238 0.35 1.14 0.87 0 0 1 2 2 <U+2586><U+2581><U+2583><U+2581><U+2587>
A0_B11 3433 0.65 0.70 0.57 0 0 1 1 2 <U+2585><U+2581><U+2587><U+2581><U+2581>
A0_B26 3433 0.65 0.80 0.65 0 0 1 1 2 <U+2585><U+2581><U+2587><U+2581><U+2582>
A0_B30 3433 0.65 1.04 0.63 0 1 1 1 2 <U+2582><U+2581><U+2587><U+2581><U+2583>
A0_B31 3433 0.65 0.96 0.79 0 0 1 2 2 <U+2587><U+2581><U+2587><U+2581><U+2586>
A0_B32 3433 0.65 1.37 0.81 0 1 2 2 2 <U+2583><U+2581><U+2583><U+2581><U+2587>

The scores are:

  • 0 if the student gave an incorrect response
  • 1 if the student gave a correct response
  • 2 if the student chose the “I don’t know” answer option
  • NA if there was no response recorded

For this analysis, we replace the “2 = I don’t know” scores with 0, reflecting a non-correct answer.

test_scores <- test_scores %>% 
  mutate(across(starts_with("A"), ~ ifelse(. == 2, 0, .)))

Data summary

The number of responses from each cohort:

test_scores %>% 
  group_by(year) %>% 
  tally() %>% 
  gt() %>% 
  data_color(
    columns = c("n"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  )
year n
2017 1682
2018 1751
2019 1999
2020 2191
2021 2048

Mean and standard deviation for each item (note this table is very wide - see the scroll bar at the bottom!):

test_scores %>% 
  select(-class, -test_version) %>% 
  group_by(year) %>% 
  skim_without_charts() %>% 
  select(-contains("character."), -contains("numeric.p"), -skim_type) %>% 
  rename(complete = complete_rate) %>% 
  # make the table wider, i.e. with separate columns for each year's results, with the year at the start of each column name
  pivot_wider(names_from = year, values_from = -c(skim_variable, year), names_glue = "{year}__{.value}") %>% 
  # put the columns in order by year
  select(sort(names(.))) %>% 
  select(skim_variable, everything()) %>% 
  # use GT to make the table look nice
  gt(rowname_col = "skim_variable") %>% 
  # group the columns from each year
  tab_spanner_delim(delim = "__") %>%
  fmt_number(columns = contains("numeric"), decimals = 2) %>%
  fmt_percent(columns = contains("complete"), decimals = 0) %>% 
  # change all the numeric.mean and numeric.sd column names to Mean and SD
  cols_label(
    .list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.mean"), label = "Mean") %>% deframe()
  ) %>% 
  cols_label(
    .list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.sd"), label = "SD") %>% deframe()
  ) %>%
  data_color(
    columns = contains("numeric.mean"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  )
2017 2018 2019 2020 2021
complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD
A1_B1 100% 0 0.95 0.22 100% 0 0.95 0.21 100% 0 0.96 0.21 100% 0 0.96 0.20 100% 0 0.95 0.21
A2_B0 100% 0 0.31 0.46 100% 0 0.28 0.45 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A3_B2 100% 0 0.62 0.48 100% 0 0.64 0.48 100% 0 0.67 0.47 100% 0 0.66 0.48 100% 0 0.67 0.47
A4_B3 100% 0 0.64 0.48 100% 0 0.62 0.48 100% 0 0.64 0.48 100% 0 0.62 0.48 100% 0 0.60 0.49
A5_B4 100% 0 0.47 0.50 100% 0 0.49 0.50 100% 0 0.49 0.50 100% 0 0.49 0.50 100% 0 0.49 0.50
A6_B5 100% 0 0.71 0.46 100% 0 0.73 0.44 100% 0 0.74 0.44 100% 0 0.76 0.42 100% 0 0.74 0.44
A7_B6 100% 0 0.71 0.46 100% 0 0.70 0.46 100% 0 0.69 0.46 100% 0 0.70 0.46 100% 0 0.68 0.47
A8_B7 100% 0 0.47 0.50 100% 0 0.50 0.50 100% 0 0.52 0.50 100% 0 0.60 0.49 100% 0 0.58 0.49
A9_B8 100% 0 0.46 0.50 100% 0 0.46 0.50 100% 0 0.46 0.50 100% 0 0.43 0.50 100% 0 0.42 0.49
A10_B9 100% 0 0.54 0.50 100% 0 0.54 0.50 100% 0 0.55 0.50 100% 0 0.55 0.50 100% 0 0.53 0.50
A11_B10 100% 0 0.57 0.49 100% 0 0.58 0.49 100% 0 0.58 0.49 100% 0 0.59 0.49 100% 0 0.57 0.50
A12_B0 100% 0 0.68 0.47 100% 0 0.71 0.45 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A13_B0 100% 0 0.37 0.48 100% 0 0.36 0.48 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A14_B12 100% 0 0.56 0.50 100% 0 0.57 0.49 100% 0 0.59 0.49 100% 0 0.54 0.50 100% 0 0.54 0.50
A15_B13 100% 0 0.46 0.50 100% 0 0.46 0.50 100% 0 0.47 0.50 100% 0 0.46 0.50 100% 0 0.45 0.50
A16_B14 100% 0 0.19 0.39 100% 0 0.20 0.40 100% 0 0.20 0.40 100% 0 0.20 0.40 100% 0 0.20 0.40
A17_B15 100% 0 0.59 0.49 100% 0 0.60 0.49 100% 0 0.60 0.49 100% 0 0.59 0.49 100% 0 0.57 0.49
A18_B16 100% 0 0.38 0.49 100% 0 0.36 0.48 100% 0 0.28 0.45 100% 0 0.27 0.45 100% 0 0.28 0.45
A19_B0 100% 0 0.35 0.48 100% 0 0.34 0.47 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A20_B17 100% 0 0.72 0.45 100% 0 0.72 0.45 100% 0 0.74 0.44 100% 0 0.71 0.46 100% 0 0.71 0.45
A21_B18 100% 0 0.51 0.50 100% 0 0.52 0.50 100% 0 0.54 0.50 100% 0 0.52 0.50 100% 0 0.51 0.50
A22_B19 100% 0 0.52 0.50 100% 0 0.52 0.50 100% 0 0.52 0.50 100% 0 0.52 0.50 100% 0 0.50 0.50
A23_B20 100% 0 0.41 0.49 100% 0 0.43 0.50 100% 0 0.42 0.49 100% 0 0.42 0.49 100% 0 0.42 0.49
A24_B21 100% 0 0.59 0.49 100% 0 0.58 0.49 100% 0 0.60 0.49 100% 0 0.56 0.50 100% 0 0.57 0.50
A25_B0 100% 0 0.54 0.50 100% 0 0.73 0.44 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A26_B22 100% 0 0.80 0.40 100% 0 0.80 0.40 100% 0 0.80 0.40 100% 0 0.81 0.39 100% 0 0.80 0.40
A27_B23 100% 0 0.39 0.49 100% 0 0.40 0.49 100% 0 0.42 0.49 100% 0 0.40 0.49 100% 0 0.41 0.49
A28_B24 100% 0 0.42 0.49 100% 0 0.46 0.50 100% 0 0.45 0.50 100% 0 0.42 0.49 100% 0 0.41 0.49
A29_B25 100% 0 0.48 0.50 100% 0 0.52 0.50 100% 0 0.51 0.50 100% 0 0.49 0.50 100% 0 0.50 0.50
A30_B0 100% 0 0.76 0.43 100% 0 0.73 0.44 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A31_B27 100% 0 0.38 0.48 100% 0 0.37 0.48 100% 0 0.39 0.49 100% 0 0.40 0.49 100% 0 0.40 0.49
A32_B28 100% 0 0.23 0.42 100% 0 0.20 0.40 100% 0 0.20 0.40 100% 0 0.21 0.41 100% 0 0.20 0.40
A33_B29 100% 0 0.78 0.42 100% 0 0.76 0.43 100% 0 0.77 0.42 100% 0 0.74 0.44 100% 0 0.75 0.43
A34_B0 100% 0 0.58 0.49 100% 0 0.61 0.49 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A35_B0 100% 0 0.32 0.47 100% 0 0.33 0.47 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A36_B0 100% 0 0.22 0.41 100% 0 0.23 0.42 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A0_B11 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.57 0.50 100% 0 0.59 0.49 100% 0 0.58 0.49
A0_B26 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.54 0.50 100% 0 0.53 0.50 100% 0 0.52 0.50
A0_B30 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.61 0.49 100% 0 0.61 0.49 100% 0 0.60 0.49
A0_B31 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.35 0.48 100% 0 0.33 0.47 100% 0 0.45 0.50
A0_B32 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.20 0.40 100% 0 0.21 0.41 100% 0 0.21 0.41

2. Testing assumptions

Before applying IRT, we should check that the data satisfies the assumptions needed by the model. In particular, to use a 1-dimensional IRT model, we should have some evidence of unidimensionality in the test scores.

Since the combined data on the two versions of the test have large portions of “missing” data (e.g. no responses to new questions from students who completed the old test), it is not possible to carry out the factor analysis as in the analyse-test script, since the factor analysis routine does not work with missing data.

Instead, in the next section we proceed directly to fitting the IRT model, and using the \(Q_3\) check for local independence. In the final section, we also run a factor analysis for the data from the new test only.

3. Fitting 2 parameter logistic MIRT model

We use the mirt package to fit an item response theory model.

For this analysis, we use the 2 parameter logistic (2PL) model.

Show model fitting output

item_scores <- test_scores %>% 
  select(-class, -year, -test_version)

fit_2pl <- mirt(
  data = item_scores, # just the columns with question scores
  #removeEmptyRows = TRUE, 
  model = 1,          # number of factors to extract
  itemtype = "2PL",   # 2 parameter logistic model
  SE = TRUE           # estimate standard errors
  )
## 
Iteration: 1, Log-Lik: -186638.319, Max-Change: 4.61593
Iteration: 2, Log-Lik: -173222.852, Max-Change: 3.34257
Iteration: 3, Log-Lik: -172025.296, Max-Change: 0.75461
Iteration: 4, Log-Lik: -171559.893, Max-Change: 0.30299
Iteration: 5, Log-Lik: -171310.677, Max-Change: 0.29565
Iteration: 6, Log-Lik: -171163.155, Max-Change: 0.15660
Iteration: 7, Log-Lik: -171055.541, Max-Change: 0.15756
Iteration: 8, Log-Lik: -170980.492, Max-Change: 0.10202
Iteration: 9, Log-Lik: -170924.986, Max-Change: 0.13257
Iteration: 10, Log-Lik: -170881.349, Max-Change: 0.08812
Iteration: 11, Log-Lik: -170846.140, Max-Change: 0.07497
Iteration: 12, Log-Lik: -170823.782, Max-Change: 0.04927
Iteration: 13, Log-Lik: -170807.153, Max-Change: 0.06731
Iteration: 14, Log-Lik: -170793.659, Max-Change: 0.03988
Iteration: 15, Log-Lik: -170783.377, Max-Change: 0.04419
Iteration: 16, Log-Lik: -170775.383, Max-Change: 0.02498
Iteration: 17, Log-Lik: -170768.350, Max-Change: 0.03335
Iteration: 18, Log-Lik: -170762.835, Max-Change: 0.01562
Iteration: 19, Log-Lik: -170757.759, Max-Change: 0.02149
Iteration: 20, Log-Lik: -170753.836, Max-Change: 0.01406
Iteration: 21, Log-Lik: -170750.152, Max-Change: 0.01118
Iteration: 22, Log-Lik: -170744.133, Max-Change: 0.01296
Iteration: 23, Log-Lik: -170741.889, Max-Change: 0.01003
Iteration: 24, Log-Lik: -170740.015, Max-Change: 0.00919
Iteration: 25, Log-Lik: -170732.968, Max-Change: 0.00538
Iteration: 26, Log-Lik: -170732.507, Max-Change: 0.00457
Iteration: 27, Log-Lik: -170732.168, Max-Change: 0.00410
Iteration: 28, Log-Lik: -170730.815, Max-Change: 0.00252
Iteration: 29, Log-Lik: -170730.746, Max-Change: 0.00278
Iteration: 30, Log-Lik: -170730.686, Max-Change: 0.00187
Iteration: 31, Log-Lik: -170730.559, Max-Change: 0.00125
Iteration: 32, Log-Lik: -170730.538, Max-Change: 0.00104
Iteration: 33, Log-Lik: -170730.519, Max-Change: 0.00099
Iteration: 34, Log-Lik: -170730.435, Max-Change: 0.00068
Iteration: 35, Log-Lik: -170730.427, Max-Change: 0.00035
Iteration: 36, Log-Lik: -170730.424, Max-Change: 0.00018
Iteration: 37, Log-Lik: -170730.419, Max-Change: 0.00019
Iteration: 38, Log-Lik: -170730.417, Max-Change: 0.00018
Iteration: 39, Log-Lik: -170730.416, Max-Change: 0.00016
Iteration: 40, Log-Lik: -170730.408, Max-Change: 0.00013
Iteration: 41, Log-Lik: -170730.408, Max-Change: 0.00013
Iteration: 42, Log-Lik: -170730.407, Max-Change: 0.00013
Iteration: 43, Log-Lik: -170730.403, Max-Change: 0.00056
Iteration: 44, Log-Lik: -170730.401, Max-Change: 0.00046
Iteration: 45, Log-Lik: -170730.399, Max-Change: 0.00046
Iteration: 46, Log-Lik: -170730.398, Max-Change: 0.00029
Iteration: 47, Log-Lik: -170730.397, Max-Change: 0.00011
Iteration: 48, Log-Lik: -170730.397, Max-Change: 0.00005
## 
## Calculating information matrix...

Local independence

We compute Yen’s \(Q_3\) (1984) to check for any dependence between items after controlling for \(\theta\). This gives a score for each pair of items, with scores above 0.2 regarded as problematic (see DeMars, p. 48).

residuals_2pl  %>% as.matrix() %>% 
  corrplot::corrplot(type = "upper")

This shows that most item pairs are independent, with only a couple of pairs showing cause for concern:

residuals_2pl %>%
  rownames_to_column(var = "item1") %>%
  as_tibble() %>% 
  pivot_longer(cols = starts_with("A"), names_to = "item2", values_to = "Q3_score") %>% 
  filter(abs(Q3_score) > 0.2) %>% 
  filter(parse_number(item1) < parse_number(item2)) %>%
  gt()
item1 item2 Q3_score
A18_B16 A19_B0 0.6739582
A34_B0 A35_B0 0.2105944

In fact, both of these pairs highlight questions that were removed from the test:

  • A18 and A19 are on the product and quotient rules, and only A18 was retained on the new test,

  • A34 and A35 are both about vectors; only A34 was retained (in modified form, as B30)

Given that this violation of the local independence assumption is very mild, we proceed using this model.

Model parameters

We then compute factor score estimates and augment the existing data frame with these estimates, to keep everything in one place. To do the estimation, we use the fscores() function from the mirt package which takes in a computed model object and computes factor score estimates according to the method specified. We will use the EAP method for factor score estimation, which is the “expected a-posteriori” method, the default.

test_scores <- test_scores %>%
  mutate(F1 = fscores(fit_2pl, method = "EAP"))

We can also calculate the model coefficient estimates using a generic function coef() which is used to extract model coefficients from objects returned by modeling functions. We will set the IRTpars argument to TRUE, which means slope intercept parameters will be converted into traditional IRT parameters.

coefs_2pl <- coef(fit_2pl, IRTpars = TRUE)

The resulting object coefs is a list, with one element for each question, and an additional GroupPars element that we won’t be using. For each question, the object records several values:

  • a is discrimination
  • b is difficulty
  • endpoints of the 95% confidence intervals are also shown

To make this output a little more user friendly, we use the tidy_mirt_coefs function that we have provided in common-functions.R, to produce a single data frame with a row for each question.

source("common-functions.R")
# use head(., -1) to remove the last element, `GroupPars`, which does not correspond to a question
tidy_2pl <- map_dfr(head(coefs_2pl, -1), tidy_mirt_coefs, .id = "Question") %>% 
  left_join(test_versions, by = c("Question" = "label"))

Here is a nicely formatted table of the result:

tidy_2pl %>%
  select(-pre,-post,-notes) %>% 
  group_by(outcome) %>% 
  gt() %>% 
  fmt_number(columns = contains("a_"), decimals = 2) %>%
  fmt_number(columns = contains("b_"), decimals = 2) %>%
  data_color(
    columns = contains("a_"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("b_"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  ) %>%
  tab_spanner(label = "Discrimination", columns = contains("a_")) %>%
  tab_spanner(label = "Difficulty", columns = contains("b_")) %>%
  cols_label(
    a_est = "Est.",
    b_est = "Est.",
    a_CI_2.5 = "2.5%",
    b_CI_2.5 = "2.5%",
    a_CI_97.5 = "97.5%",
    b_CI_97.5 = "97.5%"
  )
Question Discrimination Difficulty description item_num
Est. 2.5% 97.5% Est. 2.5% 97.5%
unchanged
A1_B1 1.26 1.14 1.38 −2.92 −3.13 −2.71 linear function 1
A3_B2 1.29 1.23 1.36 −0.64 −0.69 −0.60 arithmetic rules 3
A4_B3 1.15 1.09 1.21 −0.57 −0.62 −0.52 Easy ineq. 4
A5_B4 1.43 1.35 1.50 0.05 0.01 0.09 logs 5
A6_B5 1.51 1.43 1.60 −0.97 −1.02 −0.92 logs 6
A7_B6 1.37 1.30 1.44 −0.82 −0.87 −0.77 graph translation 7
A8_B7 1.18 1.11 1.24 −0.18 −0.22 −0.13 sine pi/3 8
A9_B8 1.98 1.89 2.08 0.17 0.14 0.20 trig.ineq. 9
A10_B9 1.74 1.66 1.83 −0.16 −0.19 −0.12 trig.identity 10
A11_B10 1.61 1.53 1.69 −0.30 −0.34 −0.26 period 11
A14_B12 1.47 1.40 1.55 −0.23 −0.27 −0.19 limit 14
A15_B13 0.83 0.78 0.88 0.21 0.16 0.27 series 15
A16_B14 1.41 1.32 1.49 1.34 1.28 1.41 diff.quotient 16
A17_B15 1.22 1.15 1.28 −0.39 −0.43 −0.34 graph f' 17
A18_B16 1.08 1.02 1.15 0.91 0.85 0.97 product rule 18
A20_B17 2.18 2.06 2.29 −0.75 −0.78 −0.71 (exp)' 20
A21_B18 2.10 2.00 2.20 −0.07 −0.10 −0.03 (ln(sin))' 21
A22_B19 1.99 1.90 2.09 −0.05 −0.08 −0.01 hyp.functions 22
A23_B20 1.94 1.85 2.03 0.26 0.23 0.30 slope tangent 23
A24_B21 1.41 1.34 1.49 −0.31 −0.35 −0.27 IVT 24
A26_B22 2.38 2.25 2.51 −1.06 −1.10 −1.02 int(poly) 26
A27_B23 1.89 1.80 1.99 0.32 0.29 0.36 int(exp) 27
A28_B24 1.85 1.76 1.94 0.23 0.20 0.27 Int = 0 28
A29_B25 0.83 0.78 0.88 0.00 −0.05 0.06 int even funct 29
A31_B27 1.29 1.23 1.36 0.45 0.40 0.49 int(abs) 30
A32_B28 1.52 1.44 1.61 1.22 1.16 1.28 FtoC algebra 31
A33_B29 0.99 0.93 1.06 −1.37 −1.46 −1.29 difference vectors 32
removed
A2_B0 0.55 0.47 0.64 1.69 1.43 1.96 3D 2
A12_B0 0.88 0.78 0.97 −1.11 −1.24 −0.98 rational exponents 12
A13_B0 0.71 0.62 0.79 0.86 0.73 1.00 ellipsoid 13
A19_B0 1.27 1.16 1.38 0.65 0.58 0.73 quotient rule 19
A25_B0 0.53 0.45 0.61 −1.14 −1.35 −0.94 velocity 25
A30_B0 1.25 1.13 1.37 −1.10 −1.20 −1.00 FtoC graph 35
A34_B0 0.89 0.80 0.99 −0.52 −0.62 −0.43 normal vector 36
A35_B0 1.26 1.15 1.37 0.74 0.66 0.82 intersect planes 33
A36_B0 1.23 1.11 1.35 1.30 1.19 1.41 parallel planes 34
added
A0_B11 0.77 0.70 0.84 −0.48 −0.56 −0.40 rational exponents 37
A0_B26 1.08 1.01 1.16 −0.15 −0.20 −0.09 FtoC graph 40
A0_B30 0.81 0.75 0.88 −0.59 −0.67 −0.51 normal vector 41
A0_B31 1.01 0.94 1.09 0.62 0.55 0.69 vector product 38
A0_B32 1.32 1.22 1.41 1.32 1.24 1.41 scalar product 39

These values are also saved to the file data-eth/OUT_2pl-results.csv.

tidy_2pl %>% 
  write_csv("output/eth_post_2pl-results.csv")

This shows the difficulty and discrimination of each of the questions, highlighting those that were added or removed:

tidy_2pl %>% 
  mutate(qnum = parse_number(Question)) %>% 
  ggplot(aes(
    x = a_est,
    y = b_est,
    label = case_when(
      outcome == "unchanged" ~ "",
      outcome == "removed" ~ pre,
      outcome == "added" ~ post
    ),
    colour = outcome
  )) +
  geom_errorbar(aes(ymin = b_CI_2.5, ymax = b_CI_97.5), width = 0, alpha = 0.5) +
  geom_errorbar(aes(xmin = a_CI_2.5, xmax = a_CI_97.5), width = 0, alpha = 0.5) +
  geom_text_repel() +
  geom_point() +
  theme_minimal() +
  labs(x = "Discrimination",
       y = "Difficulty")

Comparing years and classes

Do students from different programmes of study have different distributions of ability?

Differences between years

Compare the distribution of abilities in the year groups (though in this case there is only one).

ggplot(test_scores, aes(F1, y = year, fill = as.factor(year), colour = as.factor(year))) +
  geom_density_ridges(alpha=0.5) + 
  scale_x_continuous(limits = c(-3.5,3.5)) +
  labs(title = "Density plot", 
       subtitle = "Ability grouped by year of taking the test", 
       x = "Ability", y = "Density",
       fill = "Year", colour = "Year") +
  theme_minimal()
## Picking joint bandwidth of 0.189

Information curves

Test information curve

plot(fit_2pl, type = "infoSE", main = "Test information")

Item information curves

plot(fit_2pl, type = "infotrace", main = "Item information curves")

Total information

Using mirt’s areainfo function, we can find the total area under the information curves.

all_items = test_versions %>% filter(!is.na(pre) | !is.na(post)) %>% pull(item_num) # amend the filter to consider different subsets of items
info_2pl <- areainfo(fit_2pl, c(-4,4), which.items = all_items)
info_2pl %>% gt()
LowerBound UpperBound Info TotalInfo Proportion nitems
-4 4 52.88075 54.46439 0.9709234 41

This shows that the total information in all items (from both versions A and B of the test) is 54.4643857.

Information by item

tidy_info <- test_versions %>%
  mutate(TotalInfo = purrr::map_dbl(
    item_num,
    ~ areainfo(fit_2pl,
               c(-4, 4),
               which.items = .x) %>% pull(TotalInfo)
  ))

tidy_info %>%
  select(label, TotalInfo, description, outcome) %>% 
  arrange(-TotalInfo) %>% 
  #group_by(outcome) %>% 
  gt() %>% 
  fmt_number(columns = contains("a_"), decimals = 2) %>%
  fmt_number(columns = contains("b_"), decimals = 2) %>%
  data_color(
    columns = contains("info"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("outcome"),
    colors = scales::col_factor(palette = c("viridis"), domain = NULL)
  ) %>%
  cols_label(
    TotalInfo = "Information"
  )
label Information description outcome
A26_B22 2.3802120 int(poly) unchanged
A20_B17 2.1756893 (exp)' unchanged
A21_B18 2.0980069 (ln(sin))' unchanged
A22_B19 1.9933525 hyp.functions unchanged
A9_B8 1.9831140 trig.ineq. unchanged
A23_B20 1.9408823 slope tangent unchanged
A27_B23 1.8936276 int(exp) unchanged
A28_B24 1.8527290 Int = 0 unchanged
A10_B9 1.7433688 trig.identity unchanged
A11_B10 1.6077264 period unchanged
A33_B29 1.5215297 difference vectors unchanged
A6_B5 1.5147947 logs unchanged
A14_B12 1.4742547 limit unchanged
A5_B4 1.4257166 logs unchanged
A24_B21 1.4143219 IVT unchanged
A16_B14 1.4056184 diff.quotient unchanged
A7_B6 1.3700318 graph translation unchanged
A0_B30 1.3176397 normal vector added
A3_B2 1.2944966 arithmetic rules unchanged
A32_B28 1.2929143 FtoC algebra unchanged
A19_B0 1.2709396 quotient rule removed
A1_B1 1.2607121 linear function unchanged
A30_B0 1.2590765 FtoC graph removed
A31_B27 1.2511116 int(abs) unchanged
A34_B0 1.2255439 normal vector removed
A17_B15 1.2156068 graph f' unchanged
A8_B7 1.1759131 sine pi/3 unchanged
A4_B3 1.1498403 Easy ineq. unchanged
A0_B31 1.0840039 vector product added
A18_B16 1.0825203 product rule unchanged
A0_B26 1.0127943 FtoC graph added
A35_B0 0.9911898 intersect planes removed
A36_B0 0.8938061 parallel planes removed
A12_B0 0.8758781 rational exponents removed
A29_B25 0.8301225 int even funct unchanged
A15_B13 0.8287906 series unchanged
A0_B32 0.8117541 scalar product added
A0_B11 0.7690007 rational exponents added
A13_B0 0.7074127 ellipsoid removed
A2_B0 0.5473019 3D removed
A25_B0 0.5210397 velocity removed

Groups of questions

Grouping the questions into those that were unchanged, removed and added between the two versions:

versions_info <- purrr::map_dfr(
  c("unchanged", "removed", "added"),
  ~ areainfo(
    fit_2pl,
    c(-4, 4),
    which.items = test_versions %>% filter(outcome == .x) %>% pull(item_num)
  ) %>% mutate(version = .x, .before = 1)
) %>%
  select(-LowerBound,-UpperBound,-Info,-Proportion) %>%
  mutate(mean = TotalInfo / nitems)

versions_info %>%
  mutate(version = paste(str_to_title(version), "questions")) %>% 
  gt() %>%
  cols_label(
    version = "Subset",
    TotalInfo = "Total information",
    nitems = "Number of items",
    mean = "Mean information per item"
  )
Subset Total information Number of items Mean information per item
Unchanged questions 41.177005 27 1.5250743
Removed questions 8.292188 9 0.9213542
Added questions 4.995193 5 0.9990385

Mean information

Thus, while the changes to the test have not increased the overall information in the test (and in fact have slightly decreased it due to there being fewer questions), they have increased the mean information per question on the test:

info_old <- areainfo(fit_2pl,
                     c(-4, 4),
                     which.items = test_versions %>% filter(outcome %in% c("unchanged", "removed")) %>% pull(item_num))
info_new <- areainfo(fit_2pl,
                     c(-4, 4),
                     which.items = test_versions %>% filter(outcome %in% c("unchanged", "added")) %>% pull(item_num))

versions_info <- bind_rows("Version A" = info_old,
                           "Version B" = info_new,
                           .id = "version") %>% 
  select(-LowerBound, -UpperBound, -Info, -Proportion) %>% 
  mutate(mean = TotalInfo / nitems)

versions_info %>% gt() %>%
  cols_label(
    version = "Test version",
    TotalInfo = "Total information",
    nitems = "Number of items",
    mean = "Mean information per item"
  )
Test version Total information Number of items Mean information per item
Version A 49.46919 36 1.374144
Version B 46.17220 32 1.442881

Response curves

Test response curves

plot(fit_2pl, type = "score", auto.key = FALSE)

Item response curves

We can get individual item surface and information plots using the itemplot() function from the mirt package, e.g.

mirt::itemplot(fit_2pl, item = 1, 
               main = "Trace lines for item 1")

We can also get the plots for all trace lines, one facet per plot.

plot(fit_2pl, type = "trace", auto.key = FALSE)

Or all of them overlaid in one plot.

plot(fit_2pl, type = "trace", facet_items=FALSE)

An alternative approach is using ggplot2 and plotly to add interactivity to make it easier to identify items.

# store the object
plt <- plot(fit_2pl, type = "trace", facet_items = FALSE)
# the data we need is in panel.args
# TODO - I had to change the [[1]] to [[2]] since the plt has two panels for some reason, with the one we want being the 2nd panel
plt_data <- tibble(
  x          = plt$panel.args[[2]]$x,
  y          = plt$panel.args[[2]]$y,
  subscripts = plt$panel.args[[2]]$subscripts,
  item       = rep(colnames(item_scores), each = 200)
) %>%
  mutate(
    item_no = str_remove(item, "A") %>% as.numeric(),
    item    = fct_reorder(item, item_no)
    )

head(plt_data)
## # A tibble: 6 x 5
##       x      y subscripts item  item_no
##   <dbl>  <dbl>      <int> <fct>   <dbl>
## 1 -6    0.0201        201 A1_B1      NA
## 2 -5.94 0.0217        202 A1_B1      NA
## 3 -5.88 0.0233        203 A1_B1      NA
## 4 -5.82 0.0251        204 A1_B1      NA
## 5 -5.76 0.0271        205 A1_B1      NA
## 6 -5.70 0.0291        206 A1_B1      NA
plt_gg <- ggplot(plt_data, aes(x, y, 
                          colour = item, 
                          text = item)) + 
  geom_line() + 
  labs(
    title = "2PL - Trace lines",
    #x = expression(theta),
    x = "theta",
    #y = expression(P(theta)),
    y = "P(theta)",
    colour = "Item"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

ggplotly(plt_gg, tooltip = "text")

4. Factor analysis for the new test only

Factor analysis setup

Here we redo the factor analysis, but using only the data from the new version of the test.

item_scores_B <- test_scores %>% 
  select(-F1) %>% 
  select(-contains("B0")) %>% 
  filter(test_version == "post") %>% 
  select(-class, -year, -test_version)

The parameters package provides functions that run various checks to see if the data is suitable for factor analysis, and if so, how many factors should be retained.

structure <- check_factorstructure(item_scores_B)
n <- n_factors(item_scores_B)

Is the data suitable for Factor Analysis?

  • KMO: The Kaiser, Meyer, Olkin (KMO) measure of sampling adequacy suggests that data seems appropriate for factor analysis (KMO = 0.97).
    • Sphericity: Bartlett’s test of sphericity suggests that there is sufficient significant correlation in the data for factor analysis (Chisq(496) = 44502.66, p < .001).

Method Agreement Procedure:

The choice of 1 dimensions is supported by 6 (26.09%) methods out of 23 (Acceleration factor, R2, VSS complexity 1, Velicer’s MAP, TLI, RMSEA).

plot(n)

summary(n) %>% gt()
n_Factors n_Methods
1 6
2 1
3 2
4 5
5 2
6 1
12 2
24 1
29 3
#n %>% tibble() %>% gt()

The scree plot shows the eignvalues associated with each factor:

fa.parallel(item_scores_B, fa = "fa")

## Parallel analysis suggests that the number of factors =  7  and the number of components =  NA

Based on this, there is clear support for a 1-factor solution. We also consider the 4-factor solution.

1 Factor

Show factanal output

fitfact <- factanal(item_scores_B, factors = 1, rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores_B, factors = 1, rotation = "varimax")
## 
## Uniquenesses:
##   A1_B1   A3_B2   A4_B3   A5_B4   A6_B5   A7_B6   A8_B7   A9_B8  A10_B9 A11_B10 
##    0.95    0.78    0.80    0.73    0.77    0.76    0.78    0.62    0.66    0.70 
## A14_B12 A15_B13 A16_B14 A17_B15 A18_B16 A20_B17 A21_B18 A22_B19 A23_B20 A24_B21 
##    0.71    0.88    0.83    0.78    0.86    0.66    0.60    0.61    0.62    0.73 
## A26_B22 A27_B23 A28_B24 A29_B25 A31_B27 A32_B28 A33_B29  A0_B11  A0_B26  A0_B30 
##    0.70    0.65    0.64    0.88    0.76    0.80    0.86    0.89    0.81    0.88 
##  A0_B31  A0_B32 
##    0.84    0.83 
## 
## Loadings:
##  [1] 0.52 0.62 0.58 0.55 0.54 0.58 0.63 0.62 0.62 0.52 0.54 0.59 0.60      0.47
## [16] 0.45 0.48 0.49 0.47 0.35 0.41 0.47 0.38 0.34 0.49 0.45 0.37 0.33 0.44 0.34
## [31] 0.41 0.41
## 
##                Factor1
## SS loadings       7.63
## Proportion Var    0.24
## 
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 3991.49 on 464 degrees of freedom.
## The p-value is 0
load <- tidy(fitfact)
ggplot(load, aes(x = fl1, y = 0)) + 
  geom_point() + 
  geom_label_repel(aes(label = paste0("A", rownames(load))), show.legend = FALSE) +
  labs(x = "Factor 1", y = NULL,
       title = "Standardised Loadings", 
       subtitle = "Based upon correlation matrix") +
  theme_minimal()

load %>% 
  select(question = variable, factor_loading = fl1) %>% 
  left_join(test_versions %>% select(question = label, description), by = "question") %>% 
  arrange(-factor_loading) %>% 
  gt() %>%
  data_color(
    columns = contains("factor"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  )
question factor_loading description
A21_B18 0.6335602 (ln(sin))'
A22_B19 0.6229067 hyp.functions
A9_B8 0.6173382 trig.ineq.
A23_B20 0.6167307 slope tangent
A28_B24 0.5989062 Int = 0
A27_B23 0.5934801 int(exp)
A20_B17 0.5839634 (exp)'
A10_B9 0.5823615 trig.identity
A11_B10 0.5489394 period
A26_B22 0.5440885 int(poly)
A14_B12 0.5367772 limit
A5_B4 0.5215381 logs
A24_B21 0.5153658 IVT
A31_B27 0.4918262 int(abs)
A7_B6 0.4910048 graph translation
A6_B5 0.4757299 logs
A3_B2 0.4702819 arithmetic rules
A8_B7 0.4697829 sine pi/3
A17_B15 0.4670353 graph f'
A32_B28 0.4470902 FtoC algebra
A4_B3 0.4454821 Easy ineq.
A0_B26 0.4373808 FtoC graph
A16_B14 0.4142581 diff.quotient
A0_B32 0.4094848 scalar product
A0_B31 0.4058509 vector product
A18_B16 0.3802267 product rule
A33_B29 0.3700329 difference vectors
A15_B13 0.3531196 series
A0_B30 0.3422352 normal vector
A29_B25 0.3420163 int even funct
A0_B11 0.3327412 rational exponents
A1_B1 0.2246888 linear function

The questions that load most strongly on this factor are very standard calculations in integration, differentiation, and trigonometry – which is consistent with the aim of the test.

TODO - add comment saying it is disappointing that the new questions are all at the low end of this table?

4 Factor

Here we also investigate the 4-factor solution, to see whether these factors are interpretable.

Show factanal output

fitfact4 <- factanal(item_scores_B, factors = 4, rotation = "varimax")
print(fitfact4, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores_B, factors = 4, rotation = "varimax")
## 
## Uniquenesses:
##   A1_B1   A3_B2   A4_B3   A5_B4   A6_B5   A7_B6   A8_B7   A9_B8  A10_B9 A11_B10 
##    0.90    0.73    0.73    0.71    0.75    0.72    0.76    0.58    0.64    0.69 
## A14_B12 A15_B13 A16_B14 A17_B15 A18_B16 A20_B17 A21_B18 A22_B19 A23_B20 A24_B21 
##    0.69    0.85    0.75    0.72    0.84    0.54    0.53    0.58    0.62    0.63 
## A26_B22 A27_B23 A28_B24 A29_B25 A31_B27 A32_B28 A33_B29  A0_B11  A0_B26  A0_B30 
##    0.55    0.61    0.63    0.83    0.72    0.76    0.81    0.88    0.72    0.85 
##  A0_B31  A0_B32 
##    0.79    0.75 
## 
## Loadings:
##         Factor1 Factor2 Factor3 Factor4
## A20_B17                  0.56          
## A26_B22          0.31    0.57          
## A1_B1                                  
## A3_B2    0.35    0.34                  
## A4_B3    0.31    0.40                  
## A5_B4    0.41                          
## A6_B5            0.32                  
## A7_B6            0.42                  
## A8_B7    0.37                          
## A9_B8    0.42    0.36            0.32  
## A10_B9   0.46                          
## A11_B10  0.34    0.36                  
## A14_B12  0.40                          
## A15_B13                                
## A16_B14  0.44                          
## A17_B15          0.43                  
## A18_B16                                
## A21_B18  0.44            0.47          
## A22_B19  0.40            0.41          
## A23_B20  0.39                          
## A24_B21          0.47            0.33  
## A27_B23  0.37            0.41          
## A28_B24  0.32                    0.34  
## A29_B25                  0.36          
## A31_B27                          0.33  
## A32_B28  0.33                          
## A33_B29          0.38                  
## A0_B11                                 
## A0_B26           0.39            0.33  
## A0_B30                                 
## A0_B31                           0.33  
## A0_B32                           0.41  
## 
##                Factor1 Factor2 Factor3 Factor4
## SS loadings       2.82    2.54    2.20    1.57
## Proportion Var    0.09    0.08    0.07    0.05
## Cumulative Var    0.09    0.17    0.24    0.29
## 
## Test of the hypothesis that 4 factors are sufficient.
## The chi square statistic is 1203.85 on 374 degrees of freedom.
## The p-value is 7.2e-88
load4 <- tidy(fitfact4)
ggplot(load4, aes(x = fl1, y = fl2)) + 
  geom_point() + 
  geom_label_repel(aes(label = paste0("A", rownames(load))), show.legend = FALSE) +
  labs(x = "Factor 1", y = "Factor 2",
       title = "Standardised Loadings", 
       subtitle = "Based upon correlation matrix") +
  theme_minimal()

main_factors <- load4 %>% 
#  mutate(factorNone = 0.4) %>%  # add this to set the main factor to "None" where all loadings are below 0.4
  pivot_longer(names_to = "factor",
               cols = contains("fl")) %>% 
  mutate(value_abs = abs(value)) %>% 
  group_by(variable) %>% 
  top_n(1, value_abs) %>% 
  ungroup() %>% 
  transmute(main_factor = factor, variable)

load4 %>% 
  select(-uniqueness) %>% 
  # add the info about which is the main factor
  left_join(main_factors, by = "variable") %>%
  left_join(test_versions %>% select(variable = label, description), by = "variable") %>% 
  arrange(main_factor) %>% 
  select(main_factor, everything()) %>% 
  # arrange adjectives by descending loading on main factor
  rowwise() %>% 
  mutate(max_loading = max(abs(c_across(starts_with("fl"))))) %>% 
  group_by(main_factor) %>% 
  arrange(-max_loading, .by_group = TRUE) %>% 
  select(-max_loading) %>% 
  # sort out the presentation
  rename("Main Factor" = main_factor, # the _ throws a latex error
         "Question" = variable) %>%
  mutate_at(
    vars(starts_with("fl")),
    ~ cell_spec(round(., digits = 3), bold = if_else(abs(.) > 0.4, T, F))
  ) %>% 
  kable(booktabs = T, escape = F, longtable = T) %>% 
  kableExtra::collapse_rows(columns = 1, valign = "top") %>%
  kableExtra::kable_styling(latex_options = c("repeat_header"))
Main Factor Question fl1 fl2 fl3 fl4 description
fl1 A10_B9 0.456 0.276 0.229 0.157 trig.identity
A16_B14 0.44 0.03 0.131 0.206 diff.quotient
A9_B8 0.416 0.361 0.128 0.324 trig.ineq.
A5_B4 0.412 0.264 0.195 0.126 logs
A14_B12 0.403 0.225 0.277 0.13 limit
A23_B20 0.393 0.268 0.287 0.274 slope tangent
A8_B7 0.374 0.224 0.16 0.145 sine pi/3
A3_B2 0.346 0.34 0.175 0.022 arithmetic rules
A32_B28 0.326 0.082 0.208 0.288 FtoC algebra
A18_B16 0.269 0.098 0.193 0.2 product rule
fl2 A24_B21 0.137 0.47 0.136 0.334 IVT
A17_B15 0.152 0.43 0.139 0.23 graph f’
A7_B6 0.213 0.422 0.2 0.13 graph translation
A4_B3 0.309 0.4 0.057 0.085 Easy ineq.
A0_B26 0.115 0.388 0.086 0.333 FtoC graph
A33_B29 0.085 0.383 0.138 0.143 difference vectors
A11_B10 0.339 0.362 0.194 0.175 period
A6_B5 0.267 0.318 0.264 0.067 logs
A1_B1 0.025 0.28 0.143 -0.011 linear function
A15_B13 0.202 0.256 0.049 0.201 series
A0_B11 0.175 0.238 0.082 0.17 rational exponents
fl3 A26_B22 0.122 0.308 0.568 0.115 int(poly)
A20_B17 0.276 0.261 0.556 0.063 (exp)’
A21_B18 0.438 0.185 0.473 0.145 (ln(sin))’
A22_B19 0.402 0.197 0.414 0.22 hyp.functions
A27_B23 0.368 0.162 0.407 0.251 int(exp)
A29_B25 0.143 0.052 0.356 0.147 int even funct
A0_B30 0.093 0.135 0.26 0.232 normal vector
fl4 A0_B32 0.229 0.108 0.125 0.41 scalar product
A28_B24 0.322 0.287 0.26 0.343 Int = 0
A0_B31 0.112 0.178 0.246 0.328 vector product
A31_B27 0.293 0.283 0.097 0.326 int(abs)

TODO add comments from Meike here

The first factor is again calculations, but this time only in calculus (i.e. integrals and derivatives).

The second factor seems to be something like “abstract stuff”, it has to do with limits, rules for logarithms etc.

I guess that could be a category of its own.

The third one is interesting. It’s clearly graphical interpretations. All in different settings, but clearly belonging together.

And of the fourth factor I cannot make sense

About this report

This report supports the analysis in the following paper:

[citation needed]

Packages

In this analysis we used the following packages. You can learn more about each one by clicking on the links below.

  • mirt: For IRT analysis
  • psych: For factor analysis
  • tidyverse: For data wrangling and visualisation
  • reshape: For reshaping nested lists
  • vroom: For reading in many files at once
  • broom: For tidying model output
  • fs: For file system operations
  • gt: For formatting tables
  • knitr: For markdown tables
  • ggrepel: For labelling points without overlap
  • skimr: For data frame level summary
  • ggridges: For ridge plots